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Abstract 

The Dobzhansky-Muller model posits that incompatibilities between alleles at different loci 
cause speciation. However, it is known that if the alleles involved in a Dobzhansky-Muller 
incompatibility (DMI) between two loci are neutral, the resulting reproductive isolation cannot 
be maintained in the presence of either mutation or gene flow. Here we propose that speciation 
can emerge through the collective effects of multiple neutral DMIs that cannot, individually, 
cause speciation — a mechanism we call emergent speciation. We investigate emergent speciation 
using a haploid neutral network model with recombination. We find that certain combinations 
of multiple neutral DMIs can lead to speciation. Complex DMIs and high recombination rate 
between the DMI loci facilitate emergent speciation. These conditions are likely to occur in 
nature. We conclude that the interaction between DMIs may be a root cause of the origin of 
species. 



Introduction 

Unravelling the ways in which reproductive barriers between populations arise and are maintained 
remains a central challenge of evolutionary biology. The Dobzhansky-Muller model posits that spe- 
ciation is driven by intrinsic postzygotic reproductive isolation caused by incompatibilities between 
alleles at different loci (Dobzhansky, 1937; Muller, 1942; Orr, 1995). The kinds of strong negative 
epistatic interactions envisioned by this model are common between amino acid substitutions within 
proteins (Kondrashov et al., 2002; Kulathinal et ah, 2004). Furthermore, Dobzhansky-Muller in- 
compatibilities (hereafter DMIs) have been shown to cause inviability or sterility in hybrids between 
closely related species, although the extent to which any particular DMI has actually contributed 
to speciation remains an open question (Presgraves, 2010a, b; Maheshwari and Barbash, 2011; See- 
hausen et ah, 2014). 

In Figure 1A, we illustrate a simple version of the evolutionary scenario originally proposed 
by Dobzhansky (1937) with an incompatibility between neutral alleles at two loci (A and B) in a 
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29 haploid. We refer to this interaction as a neutral DMI. An ancestral population is fixed for the 

30 ab genotype. This population splits into two geographically isolated (allopatric) populations. One 

31 population fixes the neutral allele A at the A locus, whereas the other fixes the neutral allele B at 

32 the B locus. The derived alleles are incompatible: individuals carrying one of the derived alleles are 

33 fit but individuals carrying both of them are not. Upon secondary contact between the populations, 

34 this neutral DMI creates postzygotic isolation between the two populations: if r is the recombination 

35 rate between the loci, then r/2 of haploid Fx hybrids between individuals from the two populations 

36 are unfit (inviable or sterile). 

37 The neutral DMI described in the previous paragraph is unlikely to be an effective mechanism of 

38 speciation because it assumes that the populations diverge in perfect allopatry, and that the derived 

39 alleles go to fixation before secondary contact takes place. However, either mutation or gene flow 

40 can disrupt this process (Barton and Bengtsson, 1986; Bank et al., 2012) (Figures IB and 1C): they 

41 lead to the production of individuals with the ancestral genotype (ab) and these individuals have 

42 an advantage because they are completely compatible with individuals carrying derived alleles (Ab 

43 and aB). 

44 It is known that the reproductive barriers created by neutral DMIs can be strengthened in at least 

45 two ways. First, if selection favors the derived alleles — that is, if the DMI is not neutral (Gavrilets, 

46 1997; Agrawal et al., 2011; Bank et al., 2012). This could happen if the derived alleles are involved 

47 in adaptation to different environments, a scenario known as ecological speciation (Schluter, 2009; 

48 Agrawal et al., 2011). Second, if the two populations are prezygotically isolated. For example, 

49 the low fitness of hybrids can select against hybridization and cause the evolution of assortative 

50 mating between individuals carrying the same derived allele — a mechanism known as reinforcement 

51 (Dobzhansky, 1937; Felsenstein, 1981; Liou and Price, 1994; Servedio and Kirkpatrick, 1997). 

52 Here we consider a new mechanism we call emergent speciation — that speciation emerges through 

53 the collective effects of multiple neutral DMIs that cannot, individually, cause speciation. Low 

54 fitness in hybrids between closely related species is often caused by multiple DMIs (Presgraves, 

55 2003; Payseur and Hoekstra, 2005; Masly and Presgraves, 2007; Matute et al., 2010; Moyle and 

56 Nakazato, 2010; Schumer et al., 2014). However, it does not follow that any of these DMIs actually 

57 caused speciation: most of the DMIs may have accumulated after speciation had occurred by other 

58 means. 

59 The majority of theoretical work on DMIs has relied on either population genetic models 

60 (Nei, 1976; Bengtsson and Christiansen, 1983; Wagner et al., 1994; Gavrilets and Hastings, 1996; 

61 Gavrilets, 1997; Agrawal et al., 2011; Bank et al., 2012), or models of divergence between popula- 

62 tions (Werth and Windham, 1991; Orr, 1995; Lynch and Force, 2000b; Orr and Turelli, 2001; Welch, 

63 2004; Frai'sse et al., 2014). Both classes of models include simplifying assumptions: the former con- 

64 sider only DMIs involving 2-3 loci, whereas the latter ignore polymorphism at the DMI loci. Both 

65 simplifications are problematic: reproductive isolation is often caused by multiple DMIs involving 

66 multiple loci (Presgraves, 2003; Payseur and Hoekstra, 2005; Masly and Presgraves, 2007; Matute 

67 et al., 2010; Moyle and Nakazato, 2010; Schumer et al., 2014), and many populations contain alleles 
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68 involved in DMIs segregating within them (Cutter, 2012; Corbett-Detig et al., 2013). The few studies 

69 that have attempted to overcome these simplifications have either excluded DMIs (Flaxman et al., 

70 2014) or have not represented DMIs explicitly (Barton and Bengtsson, 1986; Gavrilets et al., 1998; 

71 Gavrilets, 1999; Barton and de Cara, 2009) and, therefore, could not capture emergent speciation. 

72 We investigate emergent speciation using a haploid neutral network model (Schuster et al., 1994; 

73 van Nimwegen et al., 1999) with recombination (Xia and Levitt, 2002; Szollosi and Derenyi, 2008), 

74 which allows us to represent DMIs involving multiple loci (Gavrilets and Gravner, 1997; Gavrilets, 

75 2004), and to take into account genetic variation at those loci (Cutter, 2012; Corbett-Detig et al., 

76 2013). 

77 A neutral network (Schuster et al., 1994; van Nimwegen et al., 1999) is a network of fit genotypes 

78 connected by mutational accessibility. Two genotypes are mutationally accessible if one genotype 

79 can be obtained from the other through a single mutation. For example, Figure 1A shows a neutral 

80 network where aB is connected to ab but not to Ab. All genotypes in the network are fit and have 
si equal fitness. All genotypes outside the network are unfit but some may be mutationally accessible 

82 from genotypes in the network. For example, in the neutral network shown in Figure 1A, AB is 

83 unfit, and it is accessible from both aB and Ab, but not ab. 

84 Neutral networks define "holey" adaptive landscapes with "ridges" of fit genotypes connecting 

85 distant genotypes (Gavrilets and Gravner, 1997; Gavrilets, 2004). They extend the neutral DMI 

86 model to multiple loci (Gavrilets and Gravner, 1997; Gavrilets, 2004); a neutral network of K 

87 genotypes with L loci, each with a alleles can be constructed by taking the entire space of a 

88 genotypes and "removing" the a L — K genotypes that carry incompatible combinations of alleles 

89 (e.g., the A and B alleles in the neutral network in Figure 1A). A single DMI of order oj (i.e., 

90 one involving alleles at oj loci) implies the removal of a L ~^ genotypes (2 < oj < L). Additional 

91 DMIs imply the removal of other genotypes, although the corresponding sets of genotypes to remove 

92 may overlap with each other. DMIs of order oj = 2 are designated simple, whereas those of order 

93 oj > 2 are designated complex (Cabot et al., 1994; Orr, 1995; Frai'sse et al., 2014). DMIs of 

94 order up to oj = 5 have been discovered in introgression studies (Frai'sse et al., 2014). The alleles of 

95 genotypes in the neutral network can be, for example, nucleotides, amino acids, insertions/deletions, 

96 or presence/absence of functional genes. Therefore, a neutral network can also be used to represent 

97 DMI-like scenarios such as the degeneration of duplicate genes (Werth and Windham, 1991; Lynch 

98 and Force, 2000b; Nei and Nozawa, 2011). 

99 We show that neutral networks defined by multiple simple and/or complex neutral DMIs can 

100 lead to the establishment of stable reproductive barriers between populations. Although the neutral 

101 network model includes its own simplifying assumptions, it captures the essence of the phenomenon 

102 of emergent speciation in the absence of other possible mechanisms of speciation. Thus, it allows 

103 us to identify and characterize some of the causes of emergent speciation, including the pattern of 

104 interactions between DMI loci and recombination. Furthermore, emergent speciation is a robust 

105 mechanism that we argue should operate under a broad range of conditions. 
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106 Results 

107 A neutral DMI between two loci is not sufficient to cause speciation 

108 Consider the neutral DMI illustrated in Figure 1A. Initially, two allopatric populations are fixed for 

109 the aB and Ah genotypes, respectively. The populations are maximally genetically differentiated 
no at the two loci (Gst = 1)- The degree of reproductive isolation between the two populations is 
in I = r/2, the mean fitness of haploid Fi hybrids between individuals from the two populations (see 

112 Materials and methods for definitions of both Gst and I). 

113 How stable is the reproductive barrier between the two populations? To address this question 

114 we begin by investigating the effect of mutation within populations. If the alleles at each locus 
us can mutate into each other (A-^a and B^b) at a rate u per locus per generation, then the degree 

116 of reproductive isolation will decline exponentially according to the expression: It ~ Iq ■ e~ 2ut , 

117 where t is time in generations, and Iq = r/2 is the initial reproductive isolation. For example, if 
us u = 10 -3 and r = 0.2, then genetic differentiation and reproductive isolation will be eliminated 

119 within ~4,000 generations (Figures IB and 1C, m = 0). Any amount of gene flow between the 

120 two populations will further accelerate the erosion of the reproductive barrier (Figures IB and 1C, 

121 m > 0). For example, if just 1 individual in 2,000 migrates from one population to the other every 

122 generation (m = 0.0005) then genetic differentiation and reproductive isolation will be eliminated 

123 within ~2,000 generations. 

124 The evolution of a stable reproductive barrier between two populations — that is, speciation — 

125 requires the existence of more than one stable equilibrium (Barton, 1996; Gavrilets and Hastings, 

126 1996). A single neutral DMI between two diallelic loci is not sufficient to cause speciation because, 

127 in the presence of mutation (0 < u < 0.5), it only contains one stable equilibrium for any level 

128 of recombination (Gavrilets, 2004), and populations will gradually evolve toward this equilibrium 

129 (Figure 1 — figure supplement 1). Changes to the adaptive landscape can cause the appearance of 

130 two stable equilibria (Bank et al., 2012). For example, if the derived alleles confer an advantage 

131 (fitness: w a B = WAb = 1 and w a b = 1 — s), and if both r and s 3> u, the genotype network will 

132 have two stable equilibria, with p^b ~ 1 and p a B ~ 1, respectively (Figure 2). Two populations in 

133 different equilibria will show a degree of reproductive isolation of: / ~ r(l + s)/2 (Figure 2E). 

134 Neutral networks based on multiple DMIs can show multiple stable equilibria 

135 We began by investigating whether neutral networks contain multiple stable equilibria. To do this 

136 we generated ensembles of 500 random neutral networks of K genotypes with L loci and a alleles 

137 per locus for a range of values of K, L and a. None of the neutral networks considered could 

138 have been specified by a single DMI of any order (2 < u < L). To construct a random neutral 

139 network, we generated K random genotypes with L loci and one of a alleles per locus, and kept the 
wo resulting network if it was connected. We ignored disconnected networks because, although they 
ui often contain multiple stable equilibria, a population is unlikely to shift from one equilibrium to 
U2 another because it requires rare multiple mutations (Gavrilets, 2004). 
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143 For each neutral network, we constructed populations with different initial genotype frequencies 

144 and allowed each population to evolve independently until it reached equilibrium. We then evaluated 

145 the stability of the resulting equilibria (see Materials and methods). No neutral networks defined 

146 on L = 3 loci with a = 2 alleles per locus contain multiple stable equilibria. However, some 

147 neutral networks with L = 4 and a = 2, and with L = 3 and a = 3 contain multiple stable 
us equilibria (Figure 3A; Figure 3 — figure supplement 1A). Populations evolving independently to 

149 different stable equilibria become genetically differentiated and partially reproductively isolated 

150 from each other (Figure 3B-C; Figure 3 — figure supplement 1B-C). Thus, speciation can emerge 

151 through the collective effects of multiple neutral DMIs that cannot, individually, cause speciation. 

152 Larger, sparser neutral networks are more likely to contain multiple stable equi- 

153 libria 

154 The probability, Pm, that a random neutral network from an ensemble shows multiple stable equi- 

155 libria is correlated with properties of the network. Pm increases with the size of the network, K 

156 (Figure 3A; Figure 3 — figure supplement 1A). We have never found any random connected neutral 

157 network with K = 5 genotypes with multiple equilibria (Pm ~ 0), regardless of the values of L and 

158 a. In contrast, networks with K = 9 genotypes defined by L = 6 diallelic loci, show Pm ~ 50%. 

159 For random neutral networks of a given size, the topology of the network also influences Pm- 
wo This is the reason why the relationship between Pm and K is non-monotonic for L = 4 diallelic loci 

161 (Figure 3A): the genotype space consists of only 2 4 = 16 genotypes, which constrains the range of 

162 topologies that a random neutral network can take. Increasing either L or a increases the size of the 

163 genotype space and, therefore, alleviates the constraint (Figure 3A; Figure 3 — figure supplement 

164 1A). 

165 Table 1 shows that Pm is correlated with multiple network properties, including the average 

166 degree of a genotype, that is, the mean number of mutational neighbors it has in the neutral 

167 network. One complication is that different properties of neutral networks are not independent 

168 of each other (Table 1; Figure 3 — figure supplement 3). Figure 3 — figure supplement 2 shows 

169 two network correlates of Pm that are, in turn, uncorrelated with each other (Figure 3 — figure 

170 supplement 3; Table 1): the spectral radius and the degree assort ativity. The spectral radius is 

171 the leading eigenvalue of the adjacency matrix and measures the mean degree of a population at 

172 equilibrium when r = 0 (van Nimwegen et al., 1999). The degree assortativity measures the extent 

173 to which nodes with a certain degree are connected with nodes with similar degree. Neutral networks 

174 with low spectral radius and negative degree assortativity — and more sparsely connected, spread 

175 out, modular networks — are more likely to show multiple stable equilibria (Table 1). However, the 

176 topology of a network is not sufficient to determine Pm'- the precise pattern of linkage between 

177 loci also influences whether a particular neutral network shows multiple stable equilibria (Figure 

178 3 — figure supplement 4). 
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179 Neutral networks based on complex DMIs are more likely to show multiple stable 

lso equilibria 

181 A neutral network of a certain size (K) can be specified by either a few low-order DMIs or many 

182 high-order DMIs. To investigate the extent to which DMIs of different order (cj) can lead to multiple 

183 stable equilibria, we have exhaustively enumerated all possible combinations of simple DMIs (a; = 2) 

184 on L = 4 diallelic loci specifying connected neutral networks with K > 6 genotypes. Of the 2,918 

185 resulting neutral networks, none were found to contain multiple stable equilibria (Pm ~ 0). 

186 This result is surprising because random neutral networks with K = 6 to 12 genotypes with 

187 L = 4 diallelic loci showed Pm ~ 12% (Figure 3A). One possibility is that simple DMIs are not 

188 sufficient to generate neutral networks with multiple stable equilibria, and that complex DMIs 

189 (uj > 2) are required. 

190 To test this hypothesis, we generated additional ensembles of random neutral networks of K = 9 

191 genotypes using random combinations of DMIs of order oj = 2 to 4 between L = 6 diallelic loci. We 

192 found that, although simple DMIs are capable of generating neutral networks with multiple stable 

193 equilibria, ~ 97% of neutral networks generated by combinations of 5-14 simple DMIs have only 

194 one stable equilibrium. As expected, Pm increases with the complexity (oj) of the DMIs (Figure 4). 

195 The existence of multiple stable equilibria depends on the recombination rate 

196 In the absence of recombination between the loci defining a neutral network, there is only one stable 

197 equilibrium (van Nimwegen et al., 1999). The genotype frequencies at equilibrium are given by the 

198 leading eigenvector of the mutation matrix M, where entry My is the mutation rate from genotype 

199 i to genotype j per generation (van Nimwegen et al., 1999). With recombination, however, multiple 

200 stable equilibria can occur (Figure 3A). 

201 To quantitatively investigate the relationship between the existence of multiple stable equilibria 

202 and the recombination rate between fitness loci (r) in a concrete example we considered the neutral 

203 network shown in Figure 5A. This was one of the random neutral networks in the K = 6, L = 3 

204 and a = 3 ensemble summarized in Figure 3 — figure supplement 1. The neutral network is defined 

205 by 10 simple DMIs: A1-B3 (i.e., A\ and B% are incompatible), A2-B2, A2-B3, A3-B2, A3-B3, B\- 

206 C\, B1-C3, -B3-C1, B3-C2, and -B3-C2. We examined how the number of stable equilibria in this 

207 neutral network changes with r while keeping the mutation rate constant (u = 10 -3 ). When the 

208 recombination rate is low (0 < r < 0.0019) the neutral network contains only one stable equilibrium 

209 regardless of initial conditions (Figure 5B). The equilibrium is symmetric in that the frequency 

210 of the A1B2 haplotype (red) is the same as that of the B1C2 haplotype (blue). Above a critical 

211 recombination rate (0.0019 < r < 0.5) there are two stable equilibria and one unstable equilibrium. 

212 Populations evolve to the different equilibria depending on initial conditions (Figure 5C). The stable 

213 equilibria are asymmetric with an excess of genotypes containing either the A1B2 (red) or B1C2 

214 (blue) haplotype, respectively (note, however, that these equilibria are symmetric with each other). 

215 The unstable equilibrium is symmetric, with equal frequencies of the A1B2 and B1C2 haplotypes. 

216 The critical point at which the equilibria bifurcate is approximately invariant with the r/u ratio 
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217 (Figure 5 — figure supplement 1). 

218 The reproductive barriers generated by multiple neutral DMIs can persist in the 

219 presence of gene flow 

220 If two allopatric populations evolve independently to the different stable equilibria of the neutral 

221 network in Figure 5A, they will become genetically differentiated and reproductively isolated to an 

222 extent that also depends on r (Figure 5D-E). 

223 The reproductive barrier created by the neutral network in Figure 5A can persist in the presence 

224 of gene flow (Figures 5D-E, red). Introducing gene flow weakens the degree of genetic differentiation 

225 and of reproductive isolation at equilibrium, and increases the critical value of r required for the 

226 persistence of a reproductive barrier (Figures 5D-E, red). However, the maximum migration rate 

227 between two populations that allows the reproductive barrier to persist is lower than the mutation 

228 rate (m ~ 0.00047, for r = 0.5; Figure 5 — figure supplement 2, blue). Stable differentiation can 

229 occur in a stepping-stone model (Kimura, 1952) with higher local migration rates (Figure 6D), but 

230 the resulting reproductive barrier does not slow down the spread of a neutral allele at an unlinked 

231 locus appreciably (Barton and Bengtsson, 1986) (Table 2). 

232 Larger neutral networks, involving complex incompatibilities between greater numbers of loci, 

233 can generate stronger reproductive barriers, capable of withstanding substantial gene flow. The 

234 neutral network shown in Figure 7A contains three stable equilibria. This was one of the random 

235 neutral networks in the K = 11 and L = 5 ensemble summarized in Figure 3. The neutral network 

236 is defined by 9 DMIs, 7 of which are complex: A-e (i.e., A and e are incompatible), B-e, A-b-D, 

237 a-B-d, A-C-D, B-c-d, b-C-D, C-D-e, and a-c-d-E. Populations at the equilibria at opposite ends 

238 of the network can show high levels of genetic differentiation and reproductive isolation (Figures 7D 

239 and 7E). If the fitness loci are unlinked (r = 0.5), then 50% of Fi hybrids between two populations 

240 at equilibrium are unfit. The maximum migration rate between two populations that allows the 

241 reproductive barrier to persist is almost two orders of magnitude higher than the mutation rate 

242 (m ss 0.0943, for r = 0.5; Figure 5 — figure supplement 2, red). In a stepping-stone model, this 

243 neutral network can slow down the spread of a neutral allele at an unlinked locus to a greater 

244 extent than a single DMI with selection for the derived alleles (Figures 6C and 6E; Table 2). Thus, 

245 emergent speciation could, in principle, be an effective mechanism of either allopatric or parapatric 

246 speciation. 

247 The probability of a stochastic shift from one stable equilibrium to the other 

248 decreases with the recombination rate 

249 In the neutral network model speciation requires that a population undergo a stochastic shift from 

250 one stable equilibrium to another. One mechanism by which this could happen is the "founder effect" 

251 (Templeton, 1980; Carson and Templeton, 1984). In this scenario, a new allopatric population is 

252 founded by a few individuals from a larger source population. The new population then expands 

253 rapidly. The stochastic shift occurs during the short period of time while the population is small. 
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254 We investigated the probability of a stochastic shift in the neutral network shown in Figure 5A (see 

255 Materials and methods). We found that the probability that a founder event causes a stochastic 

256 shift (Ps) can be high when r is low, and declines as r increases (Figure 5F). A similar relationship 

257 between Pg and r was observed for the neutral network in Figure 7A and for a single DMI with 

258 selection for the derived alleles (Figure 7F and 2F). In general, Ps declined as the reproductive 

259 barrier became stronger (Table 2). 

260 Discussion 

261 Our main result is that, when it comes to multiple neutral DMIs, the whole can be greater than 

262 the sum of its parts. Although a single neutral DMI cannot lead to the evolution of stable repro- 

263 ductive isolation, the collective effects of certain combinations of multiple neutral DMIs can lead to 

264 the evolution of strong barriers to gene flow between populations — a mechanism we call emergent 

265 speciation. 

266 Emergent speciation depends on two factors: the pattern of interactions between DMI loci and 

267 recombination. DMIs of higher order (cj), and involving greater numbers of loci (L), tend to promote 

268 emergent speciation (Figures 3 and 4). This relationship is mediated by several properties of the 

269 neutral networks specified by the DMIs: larger (K), more sparsely connected, spread out, modular 

270 neutral networks tend to facilitate emergent speciation (Figure 3; Figure 3 — figure supplement 2; 

271 Table 1). Note that our results are conservative because we considered only connected networks. 

272 Real neutral networks might, in fact, be disconnected (Jimenez et al., 2013) which would be expected 

273 to further facilitate emergent speciation. 

274 Increasing the recombination rate between DMI loci promotes emergent speciation in at least 

275 three ways. First, it causes the appearance of multiple equilibria (Figures 5B-C and 7B-C). Re- 

276 combination had been shown to generate multistability in other evolutionary models (Biirger, 1989; 

277 Bergman and Feldman, 1992; Boerlijst et al., 1996; Higgs, 1998; Wright et al., 2003; Jacobi and 

278 Nordahl, 2006; Park and Krug, 2011), although earlier studies of the evolutionary consequences of 

279 recombination in neutral networks did not detect multiple equilibria (Xia and Levitt, 2002; Szollosi 

280 and Derenyi, 2008). Second, it increases genetic differentiation between populations at the different 

281 equilibria (Figures 5D and 7D). This pattern is consistent with the observation that increasing r 

282 reduces variation within a population at equilibrium in a neutral network (Xia and Levitt, 2002; 

283 Szollosi and Derenyi, 2008; Paixao and Azevedo, 2010). Third, it increases the degree of repro- 

284 ductive isolation between populations at different equilibria (Figures 5E and 7E). This is because, 

285 in our model, recombination is required to produce hybrids and consequently is the predominant 

286 source of selection. High r between fitness loci has been shown to promote speciation in other 

287 models (Felsenstein, 1981; Bank et al., 2012). 

288 The precise pattern of recombination — that is, linkage — between loci can also determine the 

289 existence of multiple equilibria (Figure 3 — figure supplement 4). This result indicates that certain 

290 chromosomal rearrangements may facilitate emergent speciation. Note that this mechanism of chro- 
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291 mosomal speciation does not assume that different chromosomal rearrangements are polymorphic 

292 within populations and therefore is not based on suppression of recombination (Faria and Navarro, 

293 2010). 

294 How likely is emergent speciation to occur in nature? One recent study (Corbett-Detig et al., 

295 2013) found evidence that multiple simple DMIs involving loci with high r are currently segregating 

296 within natural populations of Drosophila melanogaster. Corbett-Detig and colleagues surveyed 

297 a large panel of recombinant inbred lines (RILs) (Corbett-Detig et al., 2013). They found 22 

298 incompatible pairs of alleles at unlinked loci in the RILs; of the 44 alleles, 27 were shared by two 

299 or more RILs, indicating that multiple DMIs are polymorphic within natural populations (Corbett- 

300 Detig et al., 2013). They also found evidence for multiple DMIs in RIL panels in Arabidopsis and 

301 maize (Corbett-Detig et al., 2013). Corbett-Detig and colleagues did not attempt to identify DMIs 

302 among linked loci or complex DMIs and therefore are likely to have underestimated the actual 

303 number and complexity of DMIs in the RILs. These observations suggest that the conditions for 

304 emergent speciation by multiple DMIs may indeed occur in nature, although the resulting neutral 

305 networks remain to be discovered. 

306 There is strong evidence that DMIs contribute to reproductive isolation between closely related 

307 species, but it is difficult to determine the extent to which these DMIs actually caused speciation 

308 or are simply a by-product of divergence after speciation had occurred by other means (Presgraves, 

309 2010a, b; Maheshwari and Barbash, 2011; Seehausen et al., 2014). One prediction of the emergent 

310 speciation hypothesis is that, if multiple DMIs contribute to speciation then DMIs fixed between 

311 species should have higher order (u), on average, than DMIs segregating within species. Recent 

312 surveys have concluded that complex DMIs, as well as other forms of high-order epistasis, are 

313 widespread (Presgraves, 2010a; Weinreich et al., 2013; Frai'sse et al., 2014), but a systematic com- 

314 parison between the complexity of DMIs in divergence and polymorphism remains to be carried 

315 out. 

316 The neutral network model includes two central assumptions: neutrality within the network and 

317 complete unfitness outside it. Both assumptions are plausible in the case of speciation by reciprocal 

318 degeneration or loss of duplicate genes. Gene duplication followed by reciprocal degeneration or 

319 loss of duplicate copies in different lineages can act just like a DMI (Werth and Windham, 1991; 

320 Lynch and Force, 2000b), despite not involving an epistatic interaction (Nei and Nozawa, 2011). 

321 If the duplicates are essential genes, then genotypes carrying insufficient functional copies will be 

322 completely unfit. Gene duplications, degenerations and losses are common (Force et al., 1999; 

323 Lynch and Conery, 2000; Nei and Nozawa, 2011) and a substantial fraction of gene degenerations 

324 and losses are likely to be effectively neutral (Force et al., 1999; Lynch and Force, 2000a; Lynch and 

325 Conery, 2000). Following whole genome duplications, multiple gene degenerations or losses occur 

326 (Force et al., 1999; Lynch and Force, 2000a; Scannell et al., 2006; Nei and Nozawa, 2011), and the 

327 duplicates tend to be unlinked. Thus, we predict that emergent speciation will play a major role in 

328 speciation by reciprocal degeneration or loss of duplicate genes. This form of speciation appears to 

329 have contributed to the diversification of yeasts (Scannell et al., 2006). 
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330 The assumption of "in-network" neutrality is challenged by evidence that many DMI loci have 

331 experienced positive selection during their evolutionary history (Presgraves, 2010a, b; Maheshwari 

332 and Barbash, 2011). However, the neutral network model could still apply to some of those cases for 

333 two reasons. First, emergent speciation is robust to some variation in fitness among the genotypes 

334 in a neutral network (Figure 5 — figure supplement 3). Second, neutral networks may approximate 

335 more complex scenarios where selection is weak or variable over time and/or space, or population 

336 sizes are small (Gavrilets, 2004). 

337 The assumption that "out-of-network" genotypes are completely unfit is contradicted by the 

338 observation that many DMIs cause only partial loss of fitness (Presgraves, 2003; Corbett-Detig 

339 et al., 2013; Schumer et al., 2014). However, our results also apply to partial DMIs. As long 

340 as the the disadvantage of "falling off" the neutral network is substantial, partial DMIs are still 

341 expected to lead to the evolution of stable — albeit weaker — reproductive barriers (Figure 5 — figure 

342 supplement 4). We conclude that emergent speciation is a robust mechanism that should operate 

343 under a broader range of conditions violating the two central assumptions of the neutral network 

344 model. 

345 The best studied examples of DMIs are in diploids (Presgraves, 2010b; Maheshwari and Barbash, 

346 2011). Our model assumes haploidy, which means that it is mathematically equivalent to a diploid 

347 model where the incompatible haplotypes cause dominant incompatibilities, but where the same 

348 diploid genotypes involving cis and trans allele combinations (e.g., Ab/aB and ab/AB) may have 

349 different fitnesses. The latter is rare, and the former is unrealistic: DMIs in diploids tend to 

350 be recessive (Presgraves, 2003; Masly and Presgraves, 2007). Nevertheless, diploidy is likely to 

351 facilitate emergent speciation for three reasons. First, segregation in diploids has many of the same 

352 consequences as high recombination in haploids, regardless of the rate of recombination among 

353 linked loci (Otto, 2003). Second, diploids can show much stronger reproductive isolation than 

354 haploids. Strong reproductive isolation in haploids requires that a large proportion of recombinants 

355 carry incompatible combinations of alleles. This can only be achieved with large numbers of DMI 

356 loci and high recombination rate between them. In contrast, single DMIs can cause dramatic loss 

357 of fitness in Fi hybrids in diploids (Presgraves, 2010b; Maheshwari and Barbash, 2011). Third, 

358 diploidy may allow patterns of DMI interaction that increase the probability of stochastic shifts 

359 between stable equilibria (Wagner et al., 1994; Gavrilets, 2004). 

360 Recombination does oppose emergent speciation in neutral networks in one crucial way: it 

361 reduces the probability of a stochastic shift (Ps) between stable equilibria (Figures 5F and 7F). Ps 

362 also appears to increase with the strength of the reproductive barrier at equilibrium (Figures 5F and 

363 7F). Similar observations have been made in other models (Figure 2F) (Wagner et al., 1994; Barton, 

364 1996; Gavrilets, 2004), leading many to conclude that genetic drift alone cannot cause speciation 

365 (Barton, 1996; Seehausen et al., 2014). It does not follow, however, that emergent speciation is 

366 unlikely. Shifts between stable equilibria might be facilitated by transient changes in selection 

367 (Barton, 1996). Alternatively, populations could diverge in allopatry as envisaged in traditional 
ses DMI models (Dobzhansky, 1937; Muller, 1942; Orr, 1995). 
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369 Our results have broader implications for evolutionary theory. The neutral network model was 

370 originally developed in the context of RNA and protein sequence evolution (Lipman and Wilbur, 

371 1991; Schuster et al., 1994; Huynen et al., 1996), and has played an important role in the study of 

372 the evolution of robustness and evolvability (Huynen et al., 1996; van Nimwegen et al., 1999; Ancel 

373 and Fontana, 2000; Wagner, 2008; Draghi et al., 2010). One limitation of much of this work is that 

374 it has been conducted using the asexual version of the neutral network model. Our finding that 

375 recombination promotes the appearance of multiple stable equilibria in neutral networks has clear 

376 implications for the evolution of robustness and evolvability that deserve further investigation. For 

377 example, Wagner (2011) has argued that recombination helps explore genotype space because it 

378 causes greater genotypic change than mutation. However, our results suggest that, depending on 

379 the structure of the neutral network, large sexual populations can get trapped in stable equilibria, 

380 therefore restricting their ability to explore genotype space. 

381 We have found that multiple neutral DMIs can cause emergent speciation and that the conditions 

382 that promote emergent speciation are likely to occur in natural populations. We conclude that 

383 the interaction between DMIs may be a root cause of the origin of species. Continued efforts to 

384 detect DMIs (Payseur and Hoekstra, 2005; Masly and Presgraves, 2007; Schumer et al., 2014) and to 

385 reconstruct real neutral networks (Lee et al., 1997; Jimenez et al., 2013) will be crucial to evaluating 

386 the reality and importance of emergent speciation. 

387 Materials and methods 

388 Neutral network model 

389 Organisms are haploid and carry L loci with effects on fitness. Each locus can have one of a alleles. 

390 Out of the possible a genotypes, K are fit, with equal fitness, and the remaining genotypes are 

391 completely unfit. The K genotypes define a neutral network, where genotypes are connected if one 

392 genotype can be obtained from the other through a single mutation (i.e., they differ at a single 

393 locus) . 

394 Random neutral networks 

395 Ensembles of random neutral networks were analyzed. Random neutral networks were generated by 

396 sampling K genotypes at random from the a L possible genotypes available (without replacement) 

397 and retaining the resulting network if it was connected. 

398 Neutral networks specified by DMIs 

399 To investigate the effect of the order (u) of a DMI, ensembles of neutral networks were generated 

400 by sampling combinations of d random DMIs with pre-specified values of u between alleles at L 

401 diallelic loci (see Figure 4 for more details). Following Orr (Orr, 1995), one allele at each locus was 
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402 considered to be ancestral and compatible with other ancestral alleles, and no DMIs were allowed 

403 where all the uj incompatible alleles were ancestral. 

404 Network statistics 

405 Algebraic connectivity Second smallest eigenvalue of the Laplacian matrix of the network (New- 

406 man, 2010). Abbreviated as AC in Figure 3 — figure supplement 3. Calculated using NetworkX 

407 (Hagberg et al, 2008). 

408 Average degree and variance in degree Mean and variance of the degree distribution, re- 

409 spectively (Newman, 2010). The degree of a genotype is the number of its fit mutational neighbors. 

410 Calculated using NetworkX (Hagberg et al., 2008). 

411 Average Hamming distance Average number of loci at which pairs of genotypes carry different 

412 alleles. Genotypes connected in the neutral network are at a Hamming distance of 1. 

413 Average shortest path length Average number of steps along the shortest path between pairs 
4u of genotypes (Newman, 2010). Abbreviated as PL in Figure 3 — figure supplement 3. Calculated 

415 using NetworkX (Hagberg et al., 2008). 

416 Degree assortativity A measure of the correlation of the degree of linked genotypes (Newman, 

417 2010). Abbreviated as DA in Figure 3 — figure supplement 3 and Table 1. Calculated using Net- 

418 workX (Hagberg et al., 2008). 

419 Estrada index A centrality measure (Estrada and Rodriguez- Velazquez, 2005). Calculated using 

420 NetworkX (Hagberg et al., 2008). 

421 Modularity A measure of the extent to which the network displays community structure (New- 

422 man, 2006). A community is a group of densely interconnected nodes showing relatively few con- 

423 nections to nodes outside the community. Abbreviated as Q in Figure 3 — figure supplement 3. 

424 Calculated using igraph (Csardi and Nepusz, 2006) based on an exhaustive search over all possible 

425 partitions of the network. 

426 Spectral radius Leading eigenvalue of adjacency matrix (Newman, 2010). Measures the mean de- 

427 gree of a population at equilibrium in the absence of recombination (van Nimwegen et al., 1999). Ab- 

428 breviated as SR in Figure 3 — figure supplement 3 and Table 1. Calculated using NumPy (Oliphant, 

429 2007). 
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430 Evolution 

431 Evolution on a neutral network was modeled by considering an infinite-sized population of haploid 

432 organisms reproducing sexually in discrete generations. The state of the population is given by 

433 a vector of frequencies p = (po,pi, ■ ■ ■ ,Pk), where pi is the frequency of genotype i. Genotypes 

434 outside the network are ignored because they are completely inviable (van Nimwegen et al., 1999). 

435 Individuals mate at random with respect to genotype to form a transient diploid that undergoes 

436 meiosis to produce haploid descendants. Selection takes place during the haploid phase. Mating, 

437 recombination, mutation and selection cause the population to evolve according to the equation: 

^ t+1 = K IY-> ~> Sr\ r ' W 

Y$U \(fft.ff).M . 



where p t is the state of the population at generation t, M is the mutation matrix such that entry 
Mij is the mutation rate from genotype i to genotype j per generation, and R = (R°, R 1 , . . . R^) 



is a vector of recombination matrices such that entry Rf • of matrix R 9 is the probability that 



439 
440 

441 a mating between individuals of genotypes i and j generates an individual offspring of genotype 

442 g. The diagonal elements of M (Ma) represent the probability that genotype i does not mutate 

443 (including to unfit genotypes outside the neutral network). Values of are set by assuming that 

444 each locus mutates with probability u and that a genotype can only mutate simultaneously at up 

445 to a certain number of loci. Up to L — 1 crossover events can occur between two genotypes with 

446 probability 0 < r < 0.5 per interval. The recombination rate r is the same for all pairs of adjacent 

447 loci. If r = 0.5, then there is free recombination between all loci. 



448 Equilibria 

449 Given a neutral network, population genetic parameters u and r, and a set of initial genotype 

450 frequencies po, the population was allowed to evolve until the root-mean-square deviation of the 

451 genotype frequencies in consecutive generations was RMSD(^ 4 ,^i_i) < 10~ 9 . The final genotype 

452 frequencies were identified as an equilibrium p. Multiple initial conditions were used: (i) fixed for 

453 each of the K genotypes in turn, and (ii) 4 independent sets of random frequencies. Two equilibria 

454 {pi and pj) were judged identical if RMSD(pi,Pj) < 3 x 10~ 4 . Only one of a set of identical 

455 equilibria was counted. This procedure does not guarantee the discovery of all equilibria; indeed, it 

456 likely underestimates the number of unstable equilibria. 



457 Stability analysis 

458 For each equilibrium p, the eigenvalues of the Jacobian matrix of pt+i (Ai, A2, • • . , Xk) were calcu- 

459 lated at p. If I Ai| < 1 for every i (to within a tolerance of 10 -8 ), the equilibrium was judged to be 

460 stable. 
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461 Gene flow 

462 Gene flow was modeled as symmetric migration between two populations. Migration occurs at 

463 the beginning of each generation, such that a proportion m of each population is composed of 

464 immigrants from the other population. Then random mating, recombination and mutation take 

465 place within each population, as described above. 

466 Stepping-stone model 

467 A stepping-stone model (Kimura, 1952) was used to measure the rate of spread of a neutral allele 

468 across a reproductive barrier (Barton and Bengtsson, 1986). A number n of populations are arranged 

469 in a line. Every generation a proportion 2m of a population emigrates to its two neighboring 

470 populations (except populations 1 and n, which have only one neighbor, so only m of each of 

471 them emigrate) (see Figure 6A). Note that, unlike the stepping stone model studied by Gavrilets 

472 (Gavrilets, 1997), our implementation allows the genotype frequencies of terminal populations (1 

473 and n) to vary. When n = 2 this model reduces to the gene flow model described in the previous 

474 section. 

475 Genetic differentiation 

476 Gst = 1 — Hs/Ht was used to measure the genetic differentiation between two populations at a 

477 locus, where Hg is the average gene diversity of the two populations, and Ht is the gene diversity 

478 of a population constructed by pooling the two populations (Nei, 1976). The gene diversity of a 

479 population at a locus is defined as H = 1 — Ylt=i ih where qi is the frequency of allele i. Values of 

480 Gst can vary between 0 (two populations with the same allele frequencies) and 1 (two populations 

481 fixed for different alleles). The overall genetic differentiation between two populations was quantified 

482 as the average Gst over all loci. If all genotypes in the neutral network contain the same allele at 

483 a locus, that locus is excluded from the calculation of average Gst- 

484 Reproductive isolation 

485 The degree of reproductive isolation is defined as (Barton, 1996; Palmer and Feldman, 2009): 1= 1— 

486 wh /ws, where wh is the mean fitness of haploid Fi hybrid offspring from crosses between individuals 

487 from the two populations, and ws is the average of the mean fitnesses of the individual populations. 

488 The calculation of wh and ws only takes into account the contribution of recombination, and ignores 

489 mutation. Values of / can vary between 0 (the populations are undifferentiated or r = 0) and 1 (all 

490 Fi hybrids are unfit). 

491 Founder effect speciation 

492 To simulate a founder event (Templeton, 1980; Carson and Templeton, 1984), a new population is 

493 founded from a sample of Nq individuals from an infinite-sized population at one stable equilibrium. 

494 The population is then allowed to grow according to the equation Nf = X 1 Nq, where t is the 
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495 generation number and A is the finite rate of increase. At generation t, the expected vector of 

496 genotype frequencies p t is calculated using equation (1) and a random sample of size Nt is drawn 

497 from a multinomial distribution with probabilities pf Once the population reaches Nt > 10 4 

498 individuals, it is allowed to evolve deterministically to equilibrium. If the population evolves to a 

499 different equilibrium from that of the source population, it is counted as a shift. 

500 For the adaptive landscape in Figure 2A, every simulation run was evolved to equilibrium. 

501 For the neutral network in Figure 5A, only simulation runs where at least one of the Nq founder 

502 individuals carried the A1B2 haplotype were evolved to equilibrium. Similarly, for the neutral 

503 network Figure 7A, only simulation runs where at least one of the Nq founder individuals carried 

504 either the D allele or the abde haplotype were evolved to equilibrium. Thus, the estimates of the 

505 probability that a founder event causes a stochastic shift (Ps) for the neutral networks in Figures 

506 5A and 7A slightly underestimate the true value because mutations occurring early during the 

507 population expansion phase could cause a shift. 

To estimate the probability that a founder event causes a stochastic shift (Ps) as many tries (u) 
as required to get a successful shifts were run. The following unbiased estimator was used: 

Ps 



a + u-1 



508 95% confidence intervals were calculated by parametric bootstrapping: for each estimate of Ps, 10 6 

509 random samples of a values from the negative binomial distribution with probability of success Ps 

510 were generated and Ps was recalculated; the confidence intervals were estimated as the 2.5% and 

511 97.5% quantiles of the distribution of simulated Ps values. 
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Figure 1. A neutral DMI between two loci is not sufficient to cause speciation. (A) Three 
haploid genotypes (closed circles) are fit and have equal fitness, and one genotype is unfit (open 
circle). The K = 3 fit genotypes form a neutral network. (B— C) The reproductive isolation 
generated by a neutral DMI between two diallelic loci does not persist in the face of either mutation 
or gene flow because there is only one stable equilibrium (Figure 1 — figure supplement 1). Two 
populations start fixed for the Ah and aB genotypes, respectively, and are allowed to evolve with a 
mutation rate of u = 10~ 3 per locus per generation, a recombination rate of r = 0.2 between loci, 
and in the presence of different levels of gene flow (m, proportion of a the population consisting 
of migrants from the other population, each generation). Genotypes are only allowed to mutate 
at one locus per generation. (B) Genetic differentiation (Gst) and (C) degree of reproductive 
isolation (I) between the two populations (see Materials and methods). Initially, Gst = 1 an d 
J 0 = r /2 = 0.1. The dashed line in (C) shows h = h ■ e~ 2ut . For raw data, see data/fig_l/; for 
code, see ipython/fig.l.ipynb (Dryad: Paixao et al., 2014). 
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Figure 2. Selection for derived alleles in a single DMI increases the number of stable equilibria. 
(A) Fitness landscape generated by a single DMI between L = 2 diallelic loci with selection for 
derived alleles, s measures the strength of selection against the ab genotype. This is an example 
of a "mutation-order" model (Schluter, 2009) because it assumes that the different populations 
experience the same environment. Note that if s = 1, the ab genotype is lethal and the model 
reduces to a disconnected neutral network with two genotypes: aB and Ab. (B— C) If both r and 
s 3> u, then there are two stable equilibria, one with pAb ~ 1 and the other with p a s ~ 1. (B) 
Weak recombination and selection (r = 0.002 , s = 0.05). (C) Strong recombination and selection 
(r = 0.5 , s = 0.2). Both the genetic differentiation (D) and the reproductive isolation (E) among 
populations at the two stable equilibria increases with both s and r. The degree of reproductive 
isolation is well approximated by I ~ r(l + s)/2. (F) The probability that a founder event causes 
a stochastic shift (P<?) from one stable equilibrium to the other decreases with both s and r. A new 
population was founded by taking a sample of iVo = 2 individuals from a population at the blue 
equilibrium, and was allowed to double in size every generation (A = 2, see Materials and methods). 
Ps was calculated based on 100 successful shifts to the red equilibrium. The shaded area shows 
the 95% confidence interval of each data point. The dashed line shows 0.002/r. Population genetic 
parameters: u = 10 -3 per locus; genotypes are allowed to mutate at both loci per generation. For 
raw data, see data/fig_2/; for code, see ipython/fig_2.ipynb (Dryad: Paixao et al., 2014). 



23 



Downloaded from http://biorxiv.org/on September 18, 2014 



ABC 




4 6 8 10 12 6 8 10 12 6 8 10 12 

Number of genotypes Number of genotypes Number of genotypes 



Figure 3. Larger neutral networks are more likely to contain multiple stable equilibria. (A) Prob- 
ability that random neutral networks containing K genotypes of L diallelic loci contain multiple 
stable equilibria (Pm)- Values are estimates based on ensembles of 500 random connected neutral 
networks for each combination of K and L. None of the neutral networks could have been specified 
by a single DMI of any order (2 < to < L). Genetic differentiation (B) and degree of reproductive 
isolation (C) between populations at different equilibria. In (B C) values are means for neutral 
networks containing two or more stable equilibria (if a neutral network contained more than two 
stable equilibria, the maximum pairwise Gst and / were used). All error bars are 95% confidence 
intervals. Population genetic parameters: r = 2 (L-i) between adjacent loci; up to L — 1 crossovers 
are allowed between two genotypes per generation; u = r/20 per locus per generation; genotypes 
are only allowed to mutate at L — 2 loci per generation. Pm is affected by the number of alleles per 
locus (a) (Figure 3 — figure supplement 1), by network properties of the neutral networks (Table 
1; Figure 3 — figure supplement 2; Figure 3 — figure supplement 3) and by the pattern of recombi- 
nation between sites (Figure 3 — figure supplement 4). For raw data, see data/fig_3/; for code, see 
ipython/fig_3.ipynb (Dryad: Paixao et al., 2014). 
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Figure 4. Higher-order incompatibilities increase the probability that the resulting neutral network 
shows multiple stable equilibria. Values are probabilities that neutral networks show multiple stable 
equilibria (Pm)- Each estimate is based on an ensemble of 500 random connected neutral networks 
of K = 9 genotypes generated from random combinations of DMIs of a given order (w) between 
alleles at L = 6 diallelic loci. Integer values of u indicate that the neutral networks are specified 
entirely by d DMIs of order u. Values of u of the form <ft+ 1 /2 indicate that that the neutral networks 
are specified by d/2 DMIs of order cj) and d/2 DMIs of order <f> + 1 (d even). The value of d in each 
neutral network in an ensemble was drawn at random from a broad uniform distribution. Then d 
DMIs with certain u were sampled without replacement. A neutral network was then generated 
from these DMIs and retained for further analysis if it had K = 9. Error bars are 95% confidence 
intervals. Population genetic parameters: r = 2(L-i) between adjacent loci; up to L — 1 crossovers 
are allowed between two genotypes per generation; u = r/20 per locus per generation; genotypes 
are only allowed to mutate at two loci per generation. For raw data, see data/fig_4/; for code, see 
ipython/fig_4.ipynb (Dryad: Paixao et al., 2014). 
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Figure 5. Increasing the recombination rate between DMI loci promotes emergent speciation. (A) 
Neutral network of K = 6 genotypes including all genotypes containing either the A\B<i haplotype 
(red) or the B\Ci haplotype (blue). Genotypes are represented by closed circles. Solid lines connect 
genotypes differing at a single locus. The colors relate to the equilibria as explained below. (B— C) 
The presence of multiple stable equilibria in the neutral network shown in (A) depends on the 
recombination rate (r). Populations initially fixed for a genotype shown in a given color, evolve to 
a stable equilibrium shown in the same color. (B) When the recombination rate is low (r = 10 -3 ), 
populations initially fixed for any of the genotypes in the neutral network evolve to the same stable 
equilibrium. (C) With higher recombination rate (r = 10 -2 ), populations initially fixed for any 
of 3 genotypes shown in red evolve to the stable equilibrium in red, whereas populations initially 
fixed one of the 3 genotypes shown in blue evolve to the stable equilibrium in blue. Populations 
showing perfectly even genotype frequencies evolve to the unstable equilibrium in green. The 
critical value of r at which the equilibria bifurcate depends on u (Figure 5 — figure supplement 1). 
Both the genetic differentiation (D) and the reproductive isolation (E) among populations at the 
two stable equilibria increase with r, and can persist in the presence of weak gene flow (see also, 
Figure 5 — figure supplement 2), changes in the fitness of genotypes in the neutral network (Figure 
5 — figure supplement 3), and increases in the fitness of genotypes outside the neutral network 
(Figure 5 — figure supplement 4). The blue points are well approximated by I = r/2, the maximum 
reproductive isolation attainable with one neutral DMI (Figure 1). (F) Probability that a founder 
event causes a stochastic shift (Ps) from one stable equilibrium to the other decreases with r. A 
new population was founded by taking a sample of iVo = 2 individuals from a population at the 
blue equilibrium, and was allowed to double in size every generation (A = 2, see Materials and 
methods). Ps was calculated based on 100 successful shifts to a red equilibrium. The shaded area 
shows the 95% confidence interval of each data point. The dashed line shows 0.002/r. Population 
genetic parameters: u = 10 -3 per locus; genotypes are allowed to mutate at all loci per generation; 
up to two crossovers were allowed between two genotypes. For raw data, see data/fig_5/; for code, 
see ipython/fig_5.ipynb (Dryad: Paixao et al., 2014). 
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Figure 6. Neutral DMIs can generate strong reproductive barriers. (A) Stepping-stone model. A 
number n of populations are arranged in a line. Every generation a proportion 2m of a population 
emigrates to its two neighboring populations (except populations 1 and n, which have only one 
neighbor, so only m of each of them emigrate) (see Materials and methods for more details). (B) 
Spread of a neutral allele over n = 20 populations. Locus A has two neutral alleles, A and a. 
Initially, populations 1-10 are fixed for the a allele and populations 11-20 are fixed for the A allele 
(blue). We allow the populations to evolve with m = 0.025, without mutation at the A locus. 
The green points show the frequencies of the A allele after t = 500 generations. After t = 1515 
generations, the frequency of A has increased from 0 to 25% in population 1 (Tjy in Table 2). 
Eventually, the population will reach equilibrium with each neutral allele at a frequency of 50% in 
every population (dashed line). (C— E) Equilibrium allele or genotype frequencies in a hybrid zone 
formed after the contact of two populations initially at different stable equilibria on opposite ends 
of the line. See Table 2 for analysis of flow of unlinked neutral alleles through these hybrid zones. 
(C) DMI with selection (Figure 2). Initially, populations 1-10 are fixed for the Ab genotype and 
populations 11-20 are fixed for the aB genotype. The points show the equilibrium frequencies of 
aB for r = 0.5 and m = 0.025 and different values of s. (D) Neutral network shown in Figure 
5. Initially, populations 1-10 are fixed for the B1C2 haplotype and populations 11-20 are fixed 
for the A1B2 haplotype. The points show the equilibrium frequencies of the A\B<i haplotype for 
r = 0.5 and m = 0.025. (E) Neutral network shown in Figure 7. Initially, populations 1-10 are 
fixed for the d allele and populations 11-20 are fixed for the D allele (a rough marker for the red 
equilibrium in Figure 7). The points show the equilibrium frequencies of the D allele for r = 0.5 
and m = 0.025. Population genetic parameters in (C— E): u = 10 -3 per locus; genotypes are only 
allowed to mutate at one locus per generation; up to L — 1 crossovers are allowed between two 
genotypes per generation. For raw data, see data/fig_2/, data/fig_5/ and data/fig_7/; for code, see 
ipython/fig_6.ipynb (Dryad: Paixao et al., 2014). 
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Figure 7. Multiple neutral DM incompatibities can generate substantial reproductive barriers. 
(A) Neutral network of K = 11 genotypes generated by 9 DMIs between L = 5 diallelic loci. 
Genotypes are represented by closed circles. Solid lines connect genotypes differing at a single 
locus. The colors relate to the equilibria as explained below. (B— C) The presence of multiple 
stable equilibria in the neutral network shown in (A) depends on the recombination rate. When 
the recombination rate is low (0 < r < 3.6 x 10~ 4 ), populations initially fixed for any of the 11 
genotypes in the neutral network evolve to the same stable equilibrium; when the recombination rate 
is higher (3.6 x 10~ 4 < r < 5.3 x 10~ 3 ), there are two stable equilibria; when the recombination rate is 
higher still (5.3 x 10~ 3 < r < 0.5) there are three stable equilibria. (B) When r = 10 -3 , populations 
initially fixed for any of 7 genotypes shown in red evolve to the stable equilibrium in red, whereas 
populations initially fixed one of the 5 genotypes shown in green evolve to the stable equilibrium 
in green. (C) When r = 10 -2 , populations initially fixed for genotypes shown in red, green or blue 
(same colors in (A)), evolve to the equilibrium of the same color. Both the genetic differentiation 
(D) and the reproductive isolation (E) among populations at the red and green equilibria increases 
with the recombination rate, and can persist in the presence of substantial amounts of gene flow. 
The dashed line in (E) shows / = r/2, the maximum value of reproductive isolation attainable 
with one neutral DMI (Figure 1). The reproductive barrier among populations can persist in the 
presence of weak gene flow (see also, Figure 5 — figure supplement 2), and changes in the fitness of 
genotypes in the neutral network (Figure 5 — figure supplement 3). (F) Probability that a founder 
event causes a stochastic shift (Ps) from one stable equilibrium to the other decreases with r. A new 
population was founded by taking a sample of iVo = 2 individuals from a population at the green 
equilibrium, and was allowed to double in size every generation (A = 2, see Materials and methods). 
Ps was calculated based on 30 successful shifts to a red equilibrium. The shaded area shows the 
95% confidence interval of each data point. The dashed line shows 0.0003/r. Population genetic 
parameters: u = 10~ 3 per locus; genotypes are allowed to mutate at up to two loci per generation; 
up to four crossovers were allowed between two genotypes. For raw data, see data/fig_7/; for code, 
see ipython/fig_7.ipynb (Dryad: Paixao et al., 2014). 
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Table 1. The network properties of neutral networks influence the probability that they contain 
multiple stable equilibria. 



Network property 


p(SR) 


p(DA) 


z 


Algebraic connectivity 


0.926 


0.074 


-6.79 


Average degree 


0.851 


0.118 


-8.23 


Average Hamming distance 


-0.731 


0.064 


5.84 


Average shortest path length 


-0.947 


0.035 


6.76 


Degree assortativity (DA) 


0.021 


1.000 


-7.80 


Estrada index 


0.993 


-0.022 


-8.29 


Modularity 


-0.857 


-0.105 


8.87 


Spectral radius (SR) 


1.000 


0.021 


-8.98 


Variance in degree 


0.777 


-0.408 


-3.74 



The data are for the ensemble of 500 random networks with K = 9 genotypes of L = 6 diallelic 
loci (see Figure 3A, for more details). All the network statistics listed are associated with the 
probability that a random neutral network contains multiple stable equilibria, Pm (P < 0.001; 
Figure 3 — figure supplement 2). We conducted a separate logistic regression of Pm against each 
network statistic. The z statistic is the regression coefficient divided by its standard error. Most 
network statistics are strongly correlated with each other (see also Figure 3 — figure supplement 
3). jo(SR) and /?(DA) list the Spearman's rank correlation coefficient between the network statistic 
and the spectral radius (SR) and degree assortativity (DA), respectively. All correlations with 
\p\ > 0.4 are highly statistically significant (P < 10~ 10 ). For raw data, see data/tab_l/; for code, 
see ipython/tab.l.ipynb (Dryad: Paixao et al., 2014). 
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Table 2. Multiple neutral DMIs can generate strong barriers to gene flow. 



Scenario 


725 


b 


DMI with selection (Figure 2) 






s = 0.05 


1,556 


1.0271 


s = 0.1 


1,583 


1.0449 


s = 0.25 


1,632 


1.0772 


s = 0.5 


1,690 


1.1155 


s = 0.75 


1,742 


1.1498 


s = 1 


1,790 


1.1815 


Neutral network 






K = 6, L = 3 , a = 3 (Figure 5) 


1,516 


1.0007 


K = 11 , L = 5, a = 2 (Figure 7) 


1,797 


1.1861 



For each evolutionary scenario, we simulated a stepping-stone model with n = 20 populations and 
m = 0.025 between adjacent populations (see Materials and methods and Figure 6A). The rate 
of spread of a neutral allele at an unlinked locus across a reproductive barrier created by a set of 
DMIs was used to evaluate the strength of the barrier. Populations 1-10 and 11-20 are initialized at 
different stable equilibria. The populations are then allowed to evolve with gene flow until they reach 
a new equilibrium (Figure 6C-E). Then populations 1-10 and 11-20 are fixed for different neutral 
alleles at a locus unlinked to any of the fitness loci, and allowed to continue evolving. The genotype 
frequencies for the fitness loci remain at equilibrium, but the frequencies of the neutral alleles begin 
to evolve towards 0.5 (Figure 6B). T25 measures the time required for the frequency of one of the 
neutral alleles to increase from 0 to 25% in population 1.6 = T25/T/V measures the strength of the 
barrier to gene flow where T/v = 1515 is the T25 for a neutral allele at a single locus (Figure 6B). If 
b > 1, the reproductive barrier impedes the flow of a neutral allele. Population genetic parameters: 
u = 10 -3 per fitness locus and u = 0 for the neutral marker locus; genotypes are only allowed to 
mutate at one fitness locus per generation; there is free recombination (r = 0.5) between all loci. 
For raw data, see data/fig_2/, data/fig_5/ and data/fig_7/; for code, see ipython/fig_7.ipynb (Dryad: 
Paixao et al, 2014). 
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Figure 1 — figure supplement 1. A neutral DMI between two diallelic loci (Figure 1A) only 
contains one stable equilibrium. (A) The frequencies of the three genotypes (ab, Ab and aB) in 
a population are represented as a single point in a ternary plot. (B) Evolutionary trajectories 
without recombination (r = 0) of populations starting at initial frequencies such that at least one of 
the genotypes is absent (i.e., the edges of the triangle). All trajectories converge to a single stable 

equilibrium (solid circle) with frequencies pAb = PaB = 1 — 2 an< ^ P ab = ^ ~ ^- Arrowheads 
mark the genotype frequencies after 100 generations of evolution. (C— D) Evolutionary trajectories 
with recombination rates of r = 0.01 and 0.1, respectively. As the recombination rate increases, the 
frequency of ab at equilibrium (p a b) increases, and populations approach equilibrium more quickly. 
Population genetic parameters: u = 10~ 3 per locus per generation; genotypes are only allowed to 
mutate at one locus per generation. For code, see ipython/fig_l_sl.ipynb (Dryad: Paixao et al., 
2014). 



31 



Downloaded from http://biorxiv.org/on September 18, 2014 




567896 7 8 9 6 7 8 9 

Number of genotypes Number of genotypes Number of genotypes 



Figure 3 — figure supplement 1. Larger neutral networks are more likely to contain multiple 
stable equilibria. (A) Probability that random neutral networks containing K genotypes of L loci 
with a alleles contain multiple stable equilibria (Pm)- Expected genetic differentiation (B) and 
degree of reproductive isolation (C) between populations at different equilibria. The data for a = 2 
alleles are the same as shown in Figure 3. See legend of Figure 3 for more details. For raw data, 
see data/ng_3_sl; for code, see ipython/fig_3_sl.ipynb (Dryad: Paixao et al., 2014). 
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Figure 3 — figure supplement 2. Network properties of neutral networks influence the probabil- 
ity that they contain multiple stable equilibria. To illustrate this, we show the relationship between 
two network properties and the probability that random neutral networks contain multiple stable 
equilibria (Pm) (see Materials and methods for more details). (A) Pm is negatively related to the 
spectral radius of the adjacency matrix. The spectral radius is strongly correlated to several other 
network properties, including the algebraic connectivity, average degree, and modularity (Figure 
3 — figure supplement 3; Table 1). (B) Pm is negatively related to the degree assortativity. The 
degree assortativity is moderately correlated with the variance in degree of the neutral network 
(Table 1). The data are for the ensemble of 500 random networks with K = 9 genotypes of L = 6 
diallelic loci that, overall, show Pm = 50% (Figure 3A). Similar trends are observed for other val- 
ues of K, L, and a. Values are probabilities and 95% confidence intervals for data grouped into 
approximately equal sized bins based on the network statistic. Solid lines show logistic regression 
fits to the raw data (see Table 1 for more details). For raw data, see data/tab_l; for code, see 
ipython/fig_3_s2.ipynb (Dryad: Paixao et al., 2014). 
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Figure 3 — figure supplement 3. Network properties of neutral networks are correlated with 
each other. Network statistics (see Materials and methods): PL, average shortest path length; SR, 
spectral radius; DA, degree assortativity; AC, algebraic connectivity; Q, modularity. The data are 
the same as analysed in Table 1 and Figure 3 — figure supplement 2. The diagonal shows kernel 
density and rug plots for each statistic. Red lines show locally weighted polynomial regression fits. 
All network statistics are strongly associated with Pm- For raw data, see data/tab_l; for code, see 
ipython/fig_3_s3.ipynb (Dryad: Paixao et al., 2014). 
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Figure 3 — figure supplement 4. Whether or not a neutral network shows multiple stable 
equilibria depends on the precise pattern of recombination between sites and cannot be strictly 
predicted from the topology of the network. (A) Neutral network of K = 6 genotypes generated 
by incompatibilities between L = 5 loci. The shortest path length between two genotypes in the 
network is the same as the Hamming distance between them. (B) The neutral network shown in 
(A) shows a single stable equilibrium when the recombination rate is high relative to the mutation 
rate: r = = 0-125 and u = r/20 = 0.00625. (C) The genotype network derived from that 

shown in (A) by inverting the D and E loci and inserting them between the A and B loci has the 
same topology and Hamming distances between genotypes as that in (A), but shows two stable 
equilibria for the same values of r and u as in (B). For raw data, see data/fig_3_s4; for code, see 
ipython/fig_3_s4.ipynb (Dryad: Paixao et al., 2014). 
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Figure 5 — figure supplement 1. The critical point at which the equilibria bifurcate is approx- 
imately invariant with the ratio between the recombination rate (r) and the mutation rate (it). 
We calculated the critical recombination rate (r cr it) for different values of it (spanning 3 orders of 
magnitude) for the neutral network shown in Figure 5A. We did this by finding the point where the 
symmetric equilibrium (in green in Figures 5B and 5C) changes from stable to unstable. Population 
genetic parameters: genotypes are allowed to mutate at all loci per generation; up to two crossovers 
were allowed between two genotypes. For code, see ipython/fig_5_sl.ipynb (Dryad: Paixao et al., 
2014). 
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Figure 5 — figure supplement 2. The reproductive barriers created by multiple neutral DMIs 
are robust to gene flow. Maximum migration rate between two populations that allows significant 
genetic differentiation between them to persist for different recombination rates. The analysis was 
carried out for the neutral networks shown in Figure 5 A (blue) and Figure 7A (red). Population 
genetic parameters: u = 10 -3 per locus; genotypes are allowed to mutate at up to two loci per 
generation; up to L— 1 crossovers were allowed between two genotypes. For raw data, see data/fig_5/ 
and data/fig_7/; for code, see ipython/fig_5_s2.ipynb (Dryad: Paixao et al., 2014). 
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Figure 5 — figure supplement 3. The reproductive barriers created by multiple neutral DMIs 
are robust to changes in the fitness of genotypes in the neutral network. The neutral network 
model assumes that "in-network" genotypes have a fitness of it; = 1. Values show the maximum 
selection coefficient by which the fitness of a genotype i in the neutral network can be increased 
(wi = 1 + Si) while maintaining the existence of at least two stable equilibria. The analysis was 
carried out for the neutral networks shown in Figure 5A (A) and Figure 7A (B). The dashed lines 
show s = r. Population genetic parameters: u = 10~ 3 per locus; genotypes are allowed to mutate at 
up to two loci per generation; up to L — 1 crossovers were allowed between two genotypes. For raw 
data, see data/fig_5/ and data/fig_7/; for code, see ipython/fig_5_s3.ipynb (Dryad: Paixao et al., 
2014). 
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Figure 5 — figure supplement 4. Partial DMIs can also cause emergent speciation. The neutral 
network model assumes that "out-of- network" genotypes have a fitness of u> ou t = 0. Here we 
measure the effect of changing u> ou t on the strength of the reproductive barriers evolving on the 
neutral network in Figure 5A. (A) The amount of genetic differentiation among populations at 
the two stable equilibria remains stable for w 0 ut ^ 0.7. (B) The degree of reproductive isolation 
among populations at the two stable equilibria changes as Io(l — w ou t) where Iq is the reproductive 
isolation when w out = 0 (i.e., the default model). This is because higher values of w out imply higher 
hybrid fitness. Population genetic parameters: u = 10 -3 per locus; genotypes are allowed to mutate 
at all loci per generation; up to two crossovers were allowed between two genotypes. For code, see 
ipython/fig_5_s4.ipynb (Dryad: Paixao et al., 2014). 
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